Download Oracle Big Data 2016 Implementation Essentials.1z0-449.CertDumps.2018-07-24.38q.vcex

Vendor: Oracle
Exam Code: 1z0-449
Exam Name: Oracle Big Data 2016 Implementation Essentials
Date: Jul 24, 2018
File Size: 732 KB

How to open VCEX files?

Files with VCEX extension can be opened by ProfExam Simulator.

Purchase
Coupon: EXAM_HUB

Discount: 20%

Demo Questions

Question 1
The NoSQL KVStore experiences a node failure. One of the replicas is promoted to primary. 
How will the NoSQL client that accesses the store know that there has been a change in the architecture?
  1. The KVLite utility updates the NoSQL client with the status of the master and replica.
  2. KVStoreConfig sends the status of the master and replica to the NoSQL client.
  3. The NoSQL admin agent updates the NoSQL client with the status of the master and replica.
  4. The Shard State Table (SST) contains information about each shard and the master and replica status for the shard.
Correct answer: D
Explanation:
Given a shard, the Client Driver next consults the Shard State Table (SST). For each shard, the SST contains information about each replication node comprising the group (step 5). Based upon information in the SST, such as the identity of the master and the load on the various nodes in a shard, the Client Driver selects the node to which to send the request and forwards the request to the appropriate node. In this case, since we are issuing a write operation, the request must go to the master node. 
Note: If the machine hosting the master should fail in any way, then the master automatically fails over to one of the other nodes in the shard. That is, one of the replica nodes is automatically promoted to master. 
References: http://www.oracle.com/technetwork/products/nosqldb/learnmore/nosql-wp-1436762.pdf
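For illustration only (not part of the original question), here is a minimal Java sketch of a client write using the Oracle NoSQL Database key/value API; the store name "kvstore" and the helper host:port "node1:5000" are placeholder values. The point is that application code never names a master node: the client driver's Shard State Table decides where the request goes, so a newly promoted master is picked up automatically.

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;

public class PutExample {
    public static void main(String[] args) {
        // Placeholder store name and helper host:port for this sketch.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "node1:5000"));
        // The client driver consults the Shard State Table to route this
        // write to whichever node is currently the shard's master.
        store.put(Key.createKey("users", "1234"),
                  Value.createValue("Jane Doe".getBytes()));
        store.close();
    }
}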
Question 2
Your customer is experiencing significant degradation in the performance of Hive queries. The customer wants to continue using SQL as the main query language for the HDFS store. 
Which option can the customer use to improve performance? 
  1. native MapReduce Java programs 
  2. Impala
  3. HiveFastQL
  4. Apache Grunt
Correct answer: B
Explanation:
Cloudera Impala is Cloudera's open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. 
Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation. 
References: https://en.wikipedia.org/wiki/Cloudera_Impala
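Because Impala exposes a HiveServer2-compatible endpoint, an existing SQL client can usually be repointed at an Impala daemon with only a connection-string change. The Java sketch below assumes an unsecured cluster, an Impala daemon host named impalad-host on the default port 21050, a table called web_logs, and the Hive JDBC driver on the classpath; all of these names are illustrative, not taken from the question.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: an Impala daemon's HiveServer2-compatible
        // port (21050) on a cluster with no authentication configured.
        String url = "jdbc:hive2://impalad-host:21050/default;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(*) FROM web_logs")) {
            if (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
        }
    }
}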
Question 3
Your customer keeps getting an error when writing a key/value pair to a NoSQL replica. 
What is causing the error?
  1. The master may be in read-only mode and, as a result, writes to replicas are not being allowed.
  2. The replica may be out of sync with the master and is not able to maintain consistency.
  3. The writes must be done to the master.
  4. The replica is in read-only mode.
  5. The data file for the replica is corrupt.
Correct answer: C
Explanation:
Replication Nodes are organized into shards. A shard contains a single Replication Node which is responsible for performing database writes, and which copies those writes to the other Replication Nodes in the shard. This is called the master node. All other Replication Nodes in the shard are used to service read-only operations. 
Note: Oracle NoSQL Database provides multi-terabyte distributed key/value pair storage that offers scalable throughput and performance. That is, it services network requests to store and retrieve data which is organized into key-value pairs.  
References: https://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/introduction.html
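The read/write split is visible directly in the client API. In the hedged Java sketch below (store name, helper host:port, and key are placeholders), the put is always routed to the shard's master, while the get may be served by any replica because no consistency guarantee is requested.

import java.util.concurrent.TimeUnit;
import oracle.kv.Consistency;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class ReadWriteRouting {
    public static void main(String[] args) {
        // Placeholder store name and helper host:port for this sketch.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "node1:5000"));
        Key key = Key.createKey("orders", "42");

        // Writes are always sent to the shard's master node.
        store.put(key, Value.createValue("pending".getBytes()));

        // A read with no consistency requirement may be served by a replica.
        ValueVersion vv = store.get(key, Consistency.NONE_REQUIRED,
                                    5, TimeUnit.SECONDS);
        System.out.println(new String(vv.getValue().getValue()));
        store.close();
    }
}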
Question 4
The log data for your customer's Apache web server has seven string columns.  
What is the correct command to load the log data from the file 'sample.log' into a new Hive table LOGS that does not currently exist? 
  1. hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT
    DELIMITED FIELDS TERMINATED BY ' ';
  2. hive> create table logs as select * from sample.log;
  3. hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT
    DELIMITED FIELDS TERMINATED BY ' ';
    hive> LOAD DATA LOCAL INPATH 'sample.log' OVERWRITE INTO TABLE logs;
  4. hive> LOAD DATA LOCAL INPATH 'sample.log' OVERWRITE INTO TABLE logs;
    hive> CREATE TABLE logs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) ROW FORMAT
    DELIMITED FIELDS TERMINATED BY ' ';
  5. hive> create table logs as load sample.log from hadoop;
Correct answer: C
Explanation:
The CREATE TABLE command creates a table with the given name. 
Load files into existing tables with the LOAD DATA command. 
References: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
Question 5
Your customer’s Oracle NoSQL store has a replication factor of 3. One of the customer’s replica nodes goes down. 
What will be the long-term performance impact on the customer’s NoSQL database if the node is replaced?
  1. There will be no performance impact.
  2. The database read performance will be impacted.
  3. The database read and write performance will be impacted.
  4. The database will be unavailable for reading or writing.
  5. The database write performance will be impacted.
Correct answer: C
Explanation:
The number of nodes belonging to a shard is called its Replication Factor. The larger a shard's Replication Factor, the faster its read throughput (because there are more machines to service the read requests) but the slower its write performance (because there are more machines to which writes must be copied). 
Note: Replication Nodes are organized into shards. A shard contains a single Replication Node which is responsible for performing database writes, and which copies those writes to the other Replication Nodes in the shard. This is called the master node. All other Replication Nodes in the shard are used to service read-only operations. 
References: https://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/introduction.html#replicationfactor
Question 6
Your customer is using the IKM SQL to HDFS File (Sqoop) module to move data from Oracle to HDFS. However, the customer is experiencing performance issues. 
What change should you make to the default configuration to improve performance?
  1. Change the ODI configuration to high performance mode.
  2. Increase the number of Sqoop mappers.
  3. Add additional tables.
  4. Change the HDFS server I/O settings to duplex mode.
Correct answer: B
Explanation:
Controlling the amount of parallelism that Sqoop will use to transfer data is the main way to control the load on your database. Using more mappers will lead to a higher number of concurrent data transfer tasks, which can result in faster job completion. However, it will also increase the load on the database as Sqoop will execute more concurrent queries. 
References: https://community.hortonworks.com/articles/70258/sqoop-performance-tuning.html
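Sqoop's default is four parallel map tasks. Outside of ODI, the same knob is the mapper count on the Sqoop command line; a hedged example follows, in which the connection string, credentials, table, split column, and target directory are all placeholders:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott --password-file /user/scott/.pw \
  --table EMP \
  --split-by EMPNO \
  --num-mappers 8 \
  --target-dir /user/oracle/emp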
Question 7
What is the result when a Flume event occurs for the following single-node configuration?
[Exhibit: single-node Flume agent configuration file; not reproduced in this export]
  1. The event is written to memory.
  2. The event is logged to the screen.
  3. The event output is not defined in this section.
  4. The event is sent out on port 44444.
  5. The event is written to the netcat process.
Correct answer: B
Explanation:
This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. 
Note: 
  • A sink stores the data into centralized stores like HBase and HDFS. It consumes the data (events) from the channels and delivers it to the destination. The destination of the sink might be another agent or the central stores. 
  • A source is the component of an Agent which receives data from the data generators and transfers it to one or more channels in the form of Flume events. 
Incorrect Answers:
D: Port 44444 belongs to the source (it is where the netcat source listens for events); it is not an output port for the sink.
References: https://flume.apache.org/FlumeUserGuide.html
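The exhibit itself is not reproduced above, but the behavior described matches the single-node example in the Flume User Guide, whose configuration looks roughly like this (the agent/component names a1, r1, c1, and k1 come from that guide, not necessarily from the exam exhibit):

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: listen with netcat on port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Sink: log each event to the console
a1.sinks.k1.type = logger

# Channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

With a configuration like this started via flume-ng agent --conf conf --conf-file example.conf --name a1, an event sent to port 44444 (for example with netcat) is buffered in the memory channel and then written by the logger sink to the agent's log, which is why answer B is correct.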
Question 8
What kind of workload is MapReduce designed to handle? 
  1. batch processing
  2. interactive
  3. computational
  4. real time
  5. commodity
Correct answer: A
Explanation:
Hadoop was designed for batch processing: take a large dataset as input all at once, process it, and write a large output. The very concept of MapReduce is geared towards batch rather than real-time work. As data grows, Hadoop lets you scale the cluster horizontally by adding commodity nodes to keep up with the workload. A MapReduce job accordingly ingests a large amount of data and processes it as a single batch; it does not give immediate output, and its runtime depends on the configuration of the cluster (NameNode, JobTracker, TaskTrackers, and so on). 
References: https://www.quora.com/What-is-batch-processing-in-hadoop
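A minimal word-count job sketch illustrates the batch model: the job consumes the entire input directory, runs map and reduce over all of it, and only then materializes the output directory; waitForCompletion blocks until the whole batch is finished. The input/output paths and class names below are illustrative placeholders.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountBatch {

    // Emits (word, 1) for every token of every input line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountBatch.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The whole input directory is consumed in one batch ...
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
        // ... and results appear only after the entire job has finished.
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}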
Question 9
Your customer uses LDAP for centralized user/group management. 
How will you integrate permissions management for the customer’s Big Data Appliance into the existing architecture? 
  1. Make Oracle Identity Management for Big Data the single source of truth and point LDAP to its keystore for user lookup.
  2. Enable Oracle Identity Management for Big Data and point its keystore to the LDAP directory for user lookup.
  3. Make Kerberos the single source of truth and have LDAP use the Key Distribution Center for user lookup.
  4. Enable Kerberos and have the Key Distribution Center use the LDAP directory for user lookup.
Correct answer: D
Explanation:
Kerberos integrates with LDAP servers – allowing the principals and encryption keys to be stored in the common repository. 
The complication with Kerberos authentication is that your organization needs to have a Kerberos KDC (Key Distribution Center) server setup already, which will then link to your corporate LDAP or Active Directory service to check user credentials when they request a Kerberos ticket. 
References: https://www.rittmanmead.com/blog/2015/04/setting-up-security-and-access-control-on-a-big-data-appliance/
Question 10
Your customer collects diagnostic data from its storage systems that are deployed at customer sites. The customer needs to capture and process this data by country in batches. 
Why should the customer choose Hadoop to process this data?
  1. Hadoop processes data on large clusters (10-50 max) on commodity hardware.
  2. Hadoop is a batch data processing architecture.
  3. Hadoop supports centralized computing of large data sets on large clusters.
  4. Node failures can be dealt with by configuring failover with clusterware.
  5. Hadoop processes data serially.
Correct answer: B
Explanation:
Hadoop was designed for batch processing: take a large dataset as input all at once, process it, and write a large output. The very concept of MapReduce is geared towards batch rather than real-time work. As data grows, Hadoop lets you scale the cluster horizontally by adding commodity nodes to keep up with the workload. A MapReduce job accordingly ingests a large amount of data and processes it as a single batch; it does not give immediate output, and its runtime depends on the configuration of the cluster (NameNode, JobTracker, TaskTrackers, and so on). 
Incorrect Answers:
A: Hadoop clusters are not limited to 10-50 nodes; Yahoo!, for example, ran Hadoop clusters totaling over 42,000 nodes as of July 2011. 
C: Hadoop supports distributed computing of large data sets on large clusters
E: Hadoop processes data in parallel.
References: https://www.quora.com/What-is-batch-processing-in-hadoop